The rapid growth of online tourism platforms has produced a large volume of customer reviews containing valuable opinions about tourism services. Because these reviews are unstructured, they are difficult to analyze manually. This study proposes an aspect-based sentiment analysis (ABSA) system for tourism company reviews built on machine learning techniques. The data were pre-processed through cleaning, tokenization, and stop-word removal, and features were extracted as TF-IDF values. Four machine learning classifiers were used for sentiment classification: Support Vector Machine (SVM), Random Forest (RF), XGBoost, and Gradient Boosting (GB). The models were evaluated on accuracy, precision, recall, and F1 score. All four classifiers achieved an accuracy above 97%, and Gradient Boosting performed best with an accuracy of 98.47%. The proposed framework accurately detects aspect-level sentiment and offers insight into customer opinion. The results can help tourism companies improve service quality, increase customer satisfaction, and support decision making.
Introduction
The rapid growth of tourism and online travel platforms has generated large amounts of customer review data. Traditional sentiment analysis methods classify reviews at the document or sentence level, providing only an overall positive, negative, or neutral sentiment. However, tourism reviews often contain opinions about multiple services such as accommodation, transportation, food, pricing, and staff behavior. As a result, overall sentiment classification may fail to capture customer opinions about specific aspects of their experience.
To address this limitation, the study adopts Aspect-Based Sentiment Analysis (ABSA), a fine-grained approach that identifies the individual aspects mentioned in a review and determines the sentiment expressed toward each aspect separately. This allows tourism companies to see strengths and weaknesses in specific service areas and to make targeted improvements.
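The difference between review-level and aspect-level labels can be illustrated with a small sketch. The review text, the `AspectSentiment` class, and the annotations below are hypothetical and invented for illustration; they are not taken from the study's dataset or annotation tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectSentiment:
    """One aspect-level instance (hypothetical structure for illustration)."""
    aspect: str     # aspect category, e.g. "Cleanliness"
    sentiment: str  # "positive", "negative", or "neutral"


# Hypothetical review, invented for this example.
review = "The hotel room was spotless, but the bus transfer arrived two hours late."

# Document-level analysis would assign a single label to the whole review.
# ABSA instead yields one labelled instance per aspect that is mentioned.
annotations = [
    AspectSentiment("Cleanliness", "positive"),
    AspectSentiment("Transport Services", "negative"),
]

by_sentiment = Counter(a.sentiment for a in annotations)
for a in annotations:
    print(f"{a.aspect}: {a.sentiment}")
```

A single review can therefore contribute more than one instance, so a corpus may contain more aspect-level instances than reviews.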
The research develops an ABSA framework for tourism company reviews using machine learning techniques including Support Vector Machine (SVM), Random Forest (RF), XGBoost (XGB), and Gradient Boosting (GB). The study's key contributions include:
Development of ABSA models for tourism reviews.
Creation of a dataset of 4,043 tourism reviews collected from Maharashtra, yielding 4,587 aspect-level sentiment instances.
Preparation of labeling guidelines for aspect-level sentiment annotation.
Establishment of a baseline framework for future tourism-related ABSA research.
Literature Review
Previous studies laid the foundation for sentiment analysis and opinion mining. Pang & Lee [1] and Liu [2] emphasized the importance of feature-based sentiment analysis. Hu & Liu [3] introduced early aspect-level sentiment analysis, and the SemEval ABSA competitions [4]–[6] standardized benchmark datasets. Machine learning methods such as Random Forest [7], Gradient Boosting [8], and XGBoost [9] have shown strong performance in sentiment classification. More recently, transformer-based models such as BERT [21] have further improved sentiment analysis accuracy.
Methodology
The study follows a structured methodology:
Dataset Collection: 4,043 tourism reviews were gathered, resulting in 4,587 aspect-level sentiment instances.
Data Preprocessing: Reviews were cleaned through lowercase conversion, punctuation removal, tokenization, and stop-word removal, and the cleaned text was then converted into TF-IDF features.
Aspect Identification: Nine major tourism-related aspects were analyzed:
Service
Staff Behavior
Hotel & Accommodation
Food Quality
Transport Services
Pricing
Booking Experience
Destination Experience
Cleanliness
Sentiment Classification: TF-IDF features were used to train and evaluate SVM, RF, XGBoost, and Gradient Boosting classifiers for classifying sentiments as positive, negative, or neutral.
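The preprocessing and classification steps above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's implementation: the toy sentences, the `clean` helper, the hyperparameters, and the train/test split are all invented here, and each instance is assumed to be the text that mentions a single aspect. XGBoost is left out to keep the dependencies to scikit-learn; `xgboost.XGBClassifier` could be added to the dictionary if the xgboost package is installed (its labels may need integer encoding).

```python
import re

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def clean(text: str) -> str:
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy instances (text, sentiment); not from the study's dataset.
train = [
    ("The food was delicious and fresh", "positive"),
    ("Staff were friendly and very helpful", "positive"),
    ("Clean rooms and a comfortable stay", "positive"),
    ("The bus was late and the driver was rude", "negative"),
    ("Overpriced package and terrible food", "negative"),
    ("Dirty bathroom and a rude receptionist", "negative"),
    ("The booking was done through the website", "neutral"),
    ("The hotel is located near the station", "neutral"),
    ("Breakfast is served from seven to ten", "neutral"),
]
test = [
    ("Friendly staff and delicious breakfast", "positive"),
    ("Rude driver and dirty bus", "negative"),
    ("The hotel has a website for booking", "neutral"),
]
train_texts, train_labels = zip(*train)
test_texts, test_labels = zip(*test)

# Assumed default-like settings; the study's hyperparameters are not given.
models = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # TF-IDF with the custom cleaner and English stop-word removal,
    # followed by the classifier, wrapped in one pipeline.
    pipe = make_pipeline(
        TfidfVectorizer(preprocessor=clean, stop_words="english"), model
    )
    pipe.fit(train_texts, train_labels)
    pred = pipe.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, pred, average="macro", zero_division=0
    )
    scores[name] = {
        "accuracy": accuracy_score(test_labels, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Macro averaging is used here so that each sentiment class contributes equally to precision, recall, and F1; the scores printed for this toy data say nothing about the results reported in the study.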
Dataset Statistics
Total Reviews: 4,043
Aspect-Level Instances: 4,587
Positive Sentiments: 3,259
Negative Sentiments: 1,025
Neutral Sentiments: 303
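The sentiment distribution is imbalanced: about 71% of the aspect-level instances are positive, about 22% are negative, and about 7% are neutral. Most customers therefore reported positive experiences, while far fewer instances expressed negative or neutral opinions.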
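The class shares follow directly from the counts listed above. The short sketch below recomputes them and also derives the accuracy of a trivial classifier that always predicts the majority class, which is a useful reference point when reading accuracy figures on imbalanced data. The baseline is an addition for context and is not a result reported by the study.

```python
# Counts taken from the dataset statistics above.
counts = {"positive": 3259, "negative": 1025, "neutral": 303}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Always predicting the most frequent class gives this accuracy.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label:>8}: {share:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```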
Significance
The proposed ABSA framework provides detailed insights into customer opinions about specific tourism services rather than only overall satisfaction.
Conclusion
This study presented an Aspect-Based Sentiment Analysis (ABSA) framework for tourism company reviews using machine learning techniques. The review data were pre-processed and converted into numerical features with TF-IDF. Four machine learning models were used for sentiment classification: SVM, Random Forest, XGBoost, and Gradient Boosting. In the experiments, all models achieved a classification accuracy above 97%, with Gradient Boosting performing best at 98.47%. These results indicate that machine learning techniques are effective at detecting customer sentiment in tourism reviews. The proposed framework provides detailed insight into customer opinions on individual aspects of tourism services, which can be used to improve service quality and tourist satisfaction. Overall, the study demonstrates the value of aspect-level sentiment analysis for tourism review mining and decision making. Future work could explore transformer-based and other deep learning models to improve classification accuracy further.
References
[1] B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis,” Foundations and Trends in Information Retrieval, vol. 2, no. 1–2, pp. 1–135, 2008.
[2] B. Liu, Sentiment Analysis and Opinion Mining. San Rafael, CA, USA: Morgan & Claypool Publishers, 2012.
[3] M. Hu and B. Liu, “Mining and Summarizing Customer Reviews,” in Proc. 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Seattle, WA, USA, 2004, pp. 168–177.
[4] M. Pontiki et al., “SemEval-2014 Task 4: Aspect Based Sentiment Analysis,” in Proc. 8th Int. Workshop Semantic Evaluation (SemEval), Dublin, Ireland, 2014, pp. 27–35.
[5] M. Pontiki et al., “SemEval-2015 Task 12: Aspect Based Sentiment Analysis,” in Proc. 9th Int. Workshop Semantic Evaluation (SemEval), Denver, CO, USA, 2015, pp. 486–495.
[6] M. Pontiki et al., “SemEval-2016 Task 5: Aspect Based Sentiment Analysis,” in Proc. 10th Int. Workshop Semantic Evaluation (SemEval), San Diego, CA, USA, 2016, pp. 19–30.
[7] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] J. H. Friedman, “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[9] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016, pp. 785–794.
[10] T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” in Proc. European Conf. Machine Learning, Chemnitz, Germany, 1998, pp. 137–142.
[11] E. Cambria and B. White, “Jumping NLP Curves: A Review of Natural Language Processing Research,” IEEE Computational Intelligence Magazine, vol. 9, no. 2, pp. 48–57, 2014.
[12] R. Feldman, “Techniques and Applications for Sentiment Analysis,” Communications of the ACM, vol. 56, no. 4, pp. 82–89, 2013.
[13] S. Kiritchenko, X. Zhu, and S. M. Mohammad, “Sentiment Analysis of Short Informal Texts,” Journal of Artificial Intelligence Research, vol. 50, pp. 723–762, 2014.
[14] Z. Xiang, Z. Schwartz, J. H. Gerdes, and M. Uysal, “What Can Big Data and Text Analytics Tell Us About Hotel Guest Experience and Satisfaction?” International Journal of Hospitality Management, vol. 44, pp. 120–130, 2015.
[15] J. K. Ayeh, N. Au, and R. Law, “Do We Believe in TripAdvisor? Examining Credibility Perceptions and Online Travelers’ Attitude Toward Using User-Generated Content,” Journal of Travel Research, vol. 52, no. 4, pp. 437–452, 2013.
[16] S. Park, N. Lee, and J. Han, “Aspect-Level Sentiment Analysis in Tourism Reviews Using Machine Learning Techniques,” Expert Systems with Applications, vol. 120, pp. 312–324, 2019.
[17] S. Ruder, P. Ghaffari, and J. G. Breslin, “A Hierarchical Model of Reviews for Aspect-Based Sentiment Analysis,” in Proc. EMNLP, Austin, TX, USA, 2016, pp. 999–1005.
[18] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[19] X. Zhang, J. Zhao, and Y. LeCun, “Character-Level Convolutional Networks for Text Classification,” Advances in Neural Information Processing Systems, vol. 28, pp. 649–657, 2015.
[20] A. Vaswani et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems, vol. 30, Long Beach, CA, USA, 2017, pp. 5998–6008.
[21] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proc. NAACL-HLT, Minneapolis, MN, USA, 2019, pp. 4171–4186.
[22] Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv preprint arXiv:1907.11692, 2019.
[23] J. Brownlee, Machine Learning Mastery with Python. Melbourne, Australia: Machine Learning Mastery, 2020.
[24] K. Ravi and V. Ravi, “A Survey on Opinion Mining and Sentiment Analysis: Tasks, Approaches and Applications,” Knowledge-Based Systems, vol. 89, pp. 14–46, 2015.
[25] E. Cambria, D. Das, S. Bandyopadhyay, and A. Feraco, A Practical Guide to Sentiment Analysis. Cham, Switzerland: Springer, 2017.